ABSTRACT

The purpose of this paper is to analyze the properties
of the suboptimal dual adaptive stochastic control algorithm when the
plant dynamics contain multiplicative white noise parameters. A simple
scalar example is used for this analysis.

1. INTRODUCTION

Most  stochastic  optimal  control  problems  are  not  amenable  to  a  solution 
through  the  stochastic  dynamic  programming  equation.  This  is  so  because 
of  the  "curse  of  dimensionality."  The  need,  therefore,  naturally  arises 
for  suboptimal  algorithms.  Those  suboptimal  algorithms  should,  however, 
share  desirable  qualitative  features  with  the  optimal  controls.  The  study 
of  simple  examples  of  discrete-time  linear  systems  with  quadratic  cost 
and  multiplicative  noise  indicates  two  consequences  of  parameter  uncertainty 
on  the  optimal  control  law.  On  the  one  hand,  the  presence  of  uncertainty 
in  the  parameter  has  a  stimulating  action  on  the  control  because  a  control 
exercised  at  a  given  time  can  improve  the  accuracy  of  future  parameter 
estimates. This effect has been loosely called the probing effect of the
control. On the other hand, the presence of uncertainty which cannot be
reduced by the control has an inhibitory effect on the control, loosely
called the caution effect; the larger those irreducible uncertainties, the
more attenuated the control should be. Neither of these consequences of
uncertainty, which together form the so-called dual effect, is captured by
the naive "certainty equivalent" (CE) control law, which is obtained by
setting all random parameters to their a priori mean values and treating
the system as deterministic.

In more general cases, wide-sense dual adaptive algorithms have
been suggested ([1], [2], [3]). The crux of those adaptive algorithms is
to approximate the cost-to-go in the dynamic programming equation by
expanding it about a nominal trajectory to second-order terms in the
perturbations resulting from random disturbances. The resulting cost,
called the dual cost, is minimized to yield the suboptimal control at the
corresponding time stage. It has been observed in simulations [2], [5]
that the algorithms display the desirable caution and probing features.
Moreover, it has been claimed [5] that the dual cost can be decomposed into
a sum of terms which account respectively for the caution effect, the
probing effect and the deterministic part of the cost.

In general, however, it is impossible to compare such dual control laws
with the optimal one in the case of constant but unknown parameters,
because the optimal law is unknown. We consider here a different special
case: a scalar, discrete-time linear system with white multiplicative
Gaussian noise and perfectly observed state. The optimal control law of
such systems, for a quadratic performance index, is known [4], [7]. We show
that, in that special case, it is possible to derive the dual cost and the
dual control explicitly in closed form when the length of the planning
horizon goes to infinity.

The dual cost on an infinite horizon is always finite, provided a simple
controllability  and  positive  definiteness  assumption  holds.  This  is  a 
qualitative  difference  with  respect  to  the  optimal  cost,  which  has  been 
shown  [4]  to  be  infinite  on  an  infinite  horizon,  unless  some  inequality 
is  satisfied  by  the  covariances;  that  property  has  been  referred  to  as  the 
uncertainty  threshold  principle.  Thus,  the  dual  control  fails  to  exhibit 
that  property. 

Some valuable insight can be obtained, since we show that the asymptotic
(i.e., infinite-horizon) dual control law is in fact equivalent to a
first-order expansion of the optimal control law for systems with white
parameters, as a function of the parameter covariances, about the nominal
value of null parameter covariances, which corresponds to a deterministic
problem. Since the certainty-equivalent (CE) control is simply a zero-order
approximation, the dual control is shown to be intermediate (optimal to
linear terms) between the CE and the optimal control. It is also
understandable why the uncertainty threshold principle cannot be captured by
the dual control: it is an effect which is essentially nonlinear (quadratic
and higher-order terms) in the covariances. The accuracy of the dual control
law for small parameter covariances is quite surprising, as no learning can
take place in this problem, due to the white-noise parameter assumption. In
other words, if the system parameters have small standard deviations about
their mean values, we demonstrate by means of a scalar example that the dual
control is (to first-order terms in the parameter standard deviations)
identical to the white-parameter optimal control law, which involves no
learning. One can argue both ways whether this is "good news" or "bad news".
The "good news" is that if the system parameters are not very random, then
the inherent "robustness" properties of feedback, modulated correctly for
parameter uncertainty, require no detailed "learning" of the parameters,
provided that a certain "caution" is exercised (this is not what the
certainty-equivalence principle states). The "bad news" is that the dual
control algorithm does not seem to capture the required "caution" effects
when the system parameters are very uncertain and very weakly correlated in
time.

By the above comments we do not mean to imply any criticism of the
dual adaptive control algorithm. It represents an excellent contribution
to the state of the art in the field of stochastic adaptive control, and
(again loosely speaking) it represents an intermediate approach to the
control of systems with random parameters, somewhere between the case of
perfect parameter knowledge assumptions (the certainty-equivalence case)
and the (unrealistic) case in which no learning of the system parameters is
possible (the white multiplicative parameter case). What the authors
attempt to do in this paper, by means of the simplest possible scalar
example, is to understand some of the theoretical properties of the dual
control algorithm. Thus, the reader should expect only a relatively minor
theoretical contribution; we by no means imply the superiority of any
stochastic adaptive control scheme for practical designs.

The entire field of adaptive control has not yet matured to the point where
it can provide the engineering designer with useful instructions on how to
realize an adaptive control system.

Another  contribution  of  this  paper  is  to  examine  the  structure  of 
the  stochastic  cost  to  go.  In  the  dual  control  method  the  cost  is  split 
into  three  parts,  the  deterministic  cost,  the  caution  cost,  and  the 
probing  cost.  One  would  suspect  that  the  probing  part  of  the  cost  would 
correspond  to  the  active  learning  of  the  unknown  parameters,  and  that  it 
would be zero in this example with multiplicative white parameters. However,
the splitting of the dual cost between a caution and a probing term
fails  to  have  an  appealing  meaning.  Both  terms  combine  to  yield  a  sum 
of  positive  weightings  of  the  one-step  predictions  of  the  state  covariances. 
Thus,  no  distinction  can  be  made  between  uncertainties  that  can  or  cannot 
be  influenced  by  the  control. 

In  Section  2,  the  control  problem  is  introduced.  Section  3  presents 
its  optimal  solution  on  a  finite  horizon  and  discusses  its  existence  on 
an  infinite  horizon,  which  is  governed  by  the  uncertainty  threshold 
principle.  In  Section  4,  the  dual  adaptive  control  algorithm  is  applied 
to the problem of concern. A closed-form expression for the dual cost
is derived and it is proved that it remains finite on an infinite horizon
under mild assumptions. In Section 5, the comparison between the optimal,
dual and certainty-equivalent laws is performed for the infinite-horizon
case. In Section 6, the decomposition of the dual cost is examined. Section
7 contains the conclusion.


2.  A  Scalar,  Multiplicative  White-Noise  Control  Problem 

The  simple  discrete-time  stochastic  control  problem  which  will  be 
considered  here  is  the  following: 

$$x(k+1) = a(k)\,x(k) + b(k)\,u(k) \tag{2.1}$$

$$y(k) = x(k)$$


The state $x(k)$ is scalar and perfectly observed, without observation noise.
There is also no additive process noise. The time constant $a(k)$ and the
control gain $b(k)$ are unknown parameters. They are independent from one
stage to another; namely, they constitute a white noise sequence. In
addition, they are assumed to be Gaussian, with means $\bar a$, $\bar b$ and
covariance matrix

$$\Sigma = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ab} & \Sigma_{bb} \end{bmatrix} \tag{2.2}$$

Therefore,

$$\begin{aligned}
E\{[a(k) - \bar a]\,[b(j) - \bar b]\} &= \Sigma_{ab}\,\delta_{jk} \\
E\{[a(k) - \bar a]\,[a(j) - \bar a]\} &= \Sigma_{aa}\,\delta_{jk} \\
E\{[b(k) - \bar b]\,[b(j) - \bar b]\} &= \Sigma_{bb}\,\delta_{jk}
\end{aligned} \tag{2.3}$$

where $\delta_{jk}$ is the Kronecker delta.

The initial state $x(0)$ is known and the cost function is quadratic:

$$J = E\left\{ \sum_{k=0}^{N-1} \left[ Q\,x^2(k) + R\,u^2(k) \right] + Q\,x^2(N) \right\} \tag{2.4}$$

This stochastic control problem is one of the few which lend themselves
to a closed-form analytical solution [4]. On the other hand, its structure
is simple enough that the dual cost can be expressed in closed form,
too, at least when the horizon length $N$ is infinite. This makes a
comparison possible between the optimal solution and the suboptimal
solutions obtained from either the dual adaptive algorithm [2] or the
certainty-equivalent strategy.


3.  Optimal  Control:  Finite-  and  Infinite-Horizon  Cases 


3.1  Finite  horizon 

Because of the Gaussian character of the random parameters, their
probability distribution is entirely characterized by its first- and
second-order moments. With $\bar a$ and $\bar b$ denoting the expectations
of $a(k)$ and $b(k)$ respectively, the optimal control law is found [4] to
be the following linear feedback law:

$$u_{OPT}(j) = -G(j)\,x(j) \tag{3.1}$$

where

$$G(j) = \frac{K(j+1)\,(\Sigma_{ab} + \bar a \bar b)}{R + K(j+1)\,(\Sigma_{bb} + \bar b^2)} \tag{3.2}$$

and the scalars $K(j+1)$ are given by a backward recursion of Riccati type:

$$K(j) = Q + K(j+1)\,(\Sigma_{aa} + \bar a^2) - \frac{K^2(j+1)\,(\Sigma_{ab} + \bar a \bar b)^2}{R + K(j+1)\,(\Sigma_{bb} + \bar b^2)} \tag{3.3}$$

$$K(N) = 0$$

The optimal cost is equal to:

$$J^* = K(0)\,x^2(0) \tag{3.4}$$

Each $K(j)$ can be viewed as a function of $\Sigma_{aa}$, $\Sigma_{ab}$,
$\Sigma_{bb}$. The deterministic problem, where $a$ and $b$ are known
parameters, corresponds to the values $\Sigma_{aa} = \Sigma_{ab} =
\Sigma_{bb} = 0$. It can be verified indeed that, upon setting the
covariances to zero in (3.3) and (3.2), the solution of the deterministic
linear-quadratic problem is obtained. The certainty-equivalent (CE) control
strategy consists of replacing the unknown parameters by their current
estimates and then solving the corresponding deterministic control problem.

In the present problem, because of the white-noise property, the best
current estimates of $a$ and $b$ are their a priori means $\bar a$ and
$\bar b$, since no learning is possible. Therefore, the certainty-equivalent
strategy amounts to setting $\Sigma_{aa} = \Sigma_{ab} = \Sigma_{bb} = 0$ in
(3.1), (3.2), (3.3). It is a zero-order approximation of $K(j)$ as a
function of $\Sigma$ about $\Sigma = 0$. Intuitively, the
certainty-equivalent law will become increasingly poor as the covariances
depart further from zero, i.e., as the problem becomes more stochastic. This
is to be expected, but it will be seen below that the same remark
applies to the dual adaptive control law as well.
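The recursion (3.2)-(3.3) is simple to evaluate numerically. The following Python sketch is illustrative only: the function name `optimal_gains` and its argument names are assumptions introduced here, not notation from the references. It uses the terminal condition $K(N) = 0$ stated above.

```python
def optimal_gains(a_bar, b_bar, s_aa, s_ab, s_bb, Q, R, N):
    """Backward recursion (3.3) with K(N) = 0 and gains (3.2).

    Returns (K, G) where K[j] is defined for j = 0..N and
    G[j] for j = 0..N-1, so that u_OPT(j) = -G[j] * x(j).
    """
    K = [0.0] * (N + 1)          # K[N] = 0 is the terminal condition
    G = [0.0] * N
    cross = s_ab + a_bar * b_bar  # Sigma_ab + a_bar * b_bar
    for j in range(N - 1, -1, -1):
        k_next = K[j + 1]
        denom = R + k_next * (s_bb + b_bar ** 2)
        G[j] = k_next * cross / denom
        K[j] = (Q + k_next * (s_aa + a_bar ** 2)
                - (k_next * cross) ** 2 / denom)
    return K, G


# Deterministic (certainty-equivalent) case: all covariances set to zero.
K_ce, G_ce = optimal_gains(1.0, 1.0, 0.0, 0.0, 0.0, Q=1.0, R=1.0, N=2)
```

Calling the same function with nonzero `s_aa`, `s_ab`, `s_bb` gives the optimal stochastic gains, so the two laws can be compared stage by stage under the assumed parameter values.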


3.2  Infinite  horizon;  the  Uncertainty  Threshold  Principle 


When the time horizon $N$ goes to infinity, it can be shown [4] that
the optimal cost $J^*$ need not remain bounded. In fact, a necessary and
sufficient condition for the cost $J^*$ to go to a finite limit when
$N \to \infty$ is that the following inequality between the covariances
should hold:

$$\Sigma_{aa} + \bar a^2 - \frac{(\Sigma_{ab} + \bar a \bar b)^2}{\Sigma_{bb} + \bar b^2} < 1 \tag{3.5}$$


The left-hand side of (3.5) has been called the uncertainty threshold, and
the property, the uncertainty threshold principle. This is an essentially
nonlinear result, which states that, if the covariance matrix $\Sigma$ lies
outside of a certain region, the asymptotic infinite-horizon problem is
ill-posed. This is in sharp contrast with the deterministic problem where,
under mild controllability and positive-definiteness assumptions, the
optimal cost reaches a finite limit as $N \to \infty$. In the present
problem, those assumptions are:

$$\bar b \neq 0; \quad Q > 0, \quad R > 0. \tag{3.6}$$

It is seen that, for $\Sigma_{aa} = \Sigma_{ab} = \Sigma_{bb} = 0$, the
left-hand side of (3.5) is equal to zero, so that (3.5) is satisfied.

When the inequality (3.5) is satisfied, the limit of $K(j)$ for
$N \to \infty$ is the solution of the algebraic equation corresponding to
(3.3), namely:

$$K = Q + K\,(\Sigma_{aa} + \bar a^2) - \frac{K^2\,(\Sigma_{ab} + \bar a \bar b)^2}{R + K\,(\Sigma_{bb} + \bar b^2)} \tag{3.7}$$

Inequality (3.5) is also necessary and sufficient in order for the
algebraic equation (3.7) to have a unique positive solution [4]. This
solution will be denoted $K$. It is in fact a function of the covariance
matrix $\Sigma$:

$$K = K(\Sigma).$$

An alternative way of stating the uncertainty threshold principle is
therefore as follows.

The nonlinear function $K(\Sigma)$ is defined on the region of the space of
covariance matrices $\Sigma$ described by (3.5), and it approaches infinity
as $\Sigma$ goes to the boundary of that region. Note that the asymptotic
value of the cost in the CE strategy is obtained from the value of
$K(\Sigma)$ at $\Sigma = 0$, just as in the finite-horizon case.
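The threshold test (3.5) and the stationary solution of (3.7) can be sketched numerically as follows. This is an illustrative Python sketch under stated assumptions: the function names, the fixed-point iteration started from $K = 0$ (which mimics the backward recursion (3.3) over a growing horizon), and the stopping tolerance are choices made here, not part of the original analysis.

```python
def uncertainty_threshold(a_bar, b_bar, s_aa, s_ab, s_bb):
    """Left-hand side of inequality (3.5); requires s_bb + b_bar**2 > 0."""
    return (s_aa + a_bar ** 2
            - (s_ab + a_bar * b_bar) ** 2 / (s_bb + b_bar ** 2))


def stationary_K(a_bar, b_bar, s_aa, s_ab, s_bb, Q, R,
                 max_iters=100000, tol=1e-12):
    """Iterate (3.3) from K = 0 until it settles on a solution of (3.7).

    Returns None when (3.5) is violated, since the optimal cost is then
    unbounded on an infinite horizon.
    """
    if uncertainty_threshold(a_bar, b_bar, s_aa, s_ab, s_bb) >= 1.0:
        return None
    cross = s_ab + a_bar * b_bar
    K = 0.0
    for _ in range(max_iters):
        K_new = (Q + K * (s_aa + a_bar ** 2)
                 - (K * cross) ** 2 / (R + K * (s_bb + b_bar ** 2)))
        if abs(K_new - K) <= tol * max(1.0, abs(K_new)):
            return K_new
        K = K_new
    return K  # best available iterate if the tolerance was not reached


# Deterministic case (Sigma = 0): the CE value of K.
K_ce = stationary_K(1.0, 1.0, 0.0, 0.0, 0.0, Q=1.0, R=1.0)
```

Convergence of the iteration slows as the threshold approaches 1, which is consistent with $K(\Sigma)$ growing without bound near the boundary of the admissible region.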

4.  Dual  Adaptive  Control 

4.1  Expression  for  the  Dual  Cost 

In this section, we apply to the stochastic control problem
introduced in Section 2 the wide-sense dual adaptive control algorithm
of Tse and Bar-Shalom [1], [2]. This algorithm consists of approximating
the cost-to-go from step $k+1$ on in the dynamic programming equation;
the sum of the cost at stage $k$ and of this approximated cost-to-go, called
the dual cost, is minimized with respect to the control $u(k)$ to yield
the dual control at step $k$. The approximation of the cost-to-go is
carried out in two steps. In the first step, the enlarged state $z(k+1)$,
consisting of the state $x$ and the random parameters, is estimated at time
$(k+1)$ from the information available at time $k$, and the optimal cost
corresponding to a deterministic dynamical system is calculated. This
deterministic dynamical system is obtained from the stochastic one by
setting the random parameters to their expectations.

This step is essentially the application of the CE control, say $u_o(j)$,
from time $(k+1)$ on. It results in a nominal trajectory $z_o(j)$
($j \ge k+1$), and a nominal estimate $J_o(k+1)$ of the cost-to-go.

In a second step, random disturbances $\xi(j)$ (for $j \ge k+1$) are
introduced; they cause perturbations $\delta z(j)$ of the nominal states.
The new trajectory is described by

$$z(j) = z_o(j) + \delta z(j); \quad (j \ge k+1).$$

Perturbation controls $\delta u(j)$ are exercised so as to minimize the
expected increment in the cost, $\Delta J_o(k+1)$. In order to solve that
minimization problem, the state perturbation $\delta z(j+1)$ is expanded to
second-order terms in $\delta z(j)$ and $\delta u(j)$, using the dynamical
equation about the nominal trajectory $z_o(j)$ and control $u_o(j)$. The
cost function is also expanded to second order about the nominal trajectory.

This permits the evaluation of $\Delta J_o^*(k+1)$, the minimum of
$\Delta J_o(k+1)$. The wide-sense dual adaptive control at time $k$,
$u_d(k)$, is then obtained by minimizing over the input $u(k)$ the dual cost
$J_d[u(k)]$, namely the sum of the one-step cost at stage $k$ and the
approximation of the cost-to-go from stage $(k+1)$ on:

$$J_d[u(k)] = E\{ Q x^2(k) + R u^2(k) + J_o(k+1) + \Delta J_o^*(k+1) \mid Y_k \} \tag{4.1}$$

where $Y_k$ denotes the information available at stage $k$. In the present
problem, $Y_k$ can be described by the sequences $x(i)$
($i = 0, 1, \ldots, k-1$) and $u(i)$ ($i = 0, 1, \ldots, k-1$).

In the problem introduced in Section 2, the enlarged state is defined by

$$z(k) = [x(k),\ a(k),\ b(k)]' \tag{4.2}$$

Step 1 of the dual adaptive algorithm sets the initial state at
time $(k+1)$ to the estimated value, given the information $Y_k$:

$$x_o(k+1) = \hat x(k+1|k) = \bar a\,x(k) + \bar b\,u(k) \tag{4.3}$$

The deterministic version of the dynamical equation (2.1) is

$$x_o(j+1) = \bar a\,x_o(j) + \bar b\,u_o(j) \quad (j = k+1, \ldots, N-1) \tag{4.4}$$

The nominal control sequence is the optimal control sequence of the
associated deterministic problem from time $(k+1)$ on:

$$u_o(j) = -G_o(j)\,x_o(j) = -\frac{K_o(j+1)\,\bar a \bar b}{R + K_o(j+1)\,\bar b^2}\,x_o(j) \tag{4.5}$$

and $K_o(j)$ is given recursively by the Riccati difference equation:

$$K_o(j) = Q + K_o(j+1)\,\bar a^2 - \frac{K_o^2(j+1)\,\bar a^2 \bar b^2}{R + K_o(j+1)\,\bar b^2}, \qquad K_o(N) = 0 \tag{4.6}$$

Equation (4.6) is in fact the special version of Eq. (3.3) corresponding
to $\Sigma_{aa} = \Sigma_{ab} = \Sigma_{bb} = 0$. The initial estimate of
the cost-to-go is given by:

$$J_o(k+1) = \tfrac{1}{2} K_o(k+1)\,x_o^2(k+1) = \tfrac{1}{2} K_o(k+1)\,[\bar a\,x(k) + \bar b\,u(k)]^2 \tag{4.7}$$

In step 2 of the dual adaptive algorithm, the covariances of the enlarged
state appear in the calculation of the cost perturbation
$\Delta J_o^*(k+1)$. The updated covariance matrix of the perturbation
$\delta z(j)$ of the enlarged state $z(j)$ given the current information,
along the nominal trajectory, is:

$$\Sigma_o(j|j) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \Sigma_{aa} & \Sigma_{ab} \\ 0 & \Sigma_{ab} & \Sigma_{bb} \end{bmatrix} \tag{4.8}$$

This results from the fact that the state $x(k)$ is exactly observed and
from the white noise assumption on $a(k)$ and $b(k)$. The one-step
predicted covariance of the perturbation of the enlarged state, along the
nominal trajectory, is

$$\Sigma_o(j+1|j) = \begin{bmatrix} \Sigma^o_{xx}(j+1|j) & 0 & 0 \\ 0 & \Sigma_{aa} & \Sigma_{ab} \\ 0 & \Sigma_{ab} & \Sigma_{bb} \end{bmatrix} \tag{4.9}$$

where

$$\Sigma^o_{xx}(j+1|j) = \Sigma_{aa}\,x_o^2(j) + 2\,\Sigma_{ab}\,x_o(j)\,u_o(j) + \Sigma_{bb}\,u_o^2(j) \tag{4.10}$$

and $x_o(j)$, $u_o(j)$ denote the nominal trajectory and control, as obtained
in step 1.

Equation (4.9) also results from the white-noise assumption and the
perfect observation of the state. From the expression for the dual cost
as given in Tse et al. ([3], Eq. (3-12)) it follows that

$$\begin{aligned}
J_d[u(k)] = {}& \tfrac{1}{2} R\,u^2(k) + \tfrac{1}{2} K_o(k+1)\,\hat x^2(k+1|k) + p_o(k+1)\,\hat x(k+1|k) \\
&+ \tfrac{1}{2} \operatorname{tr}\Big\{ \sum_{j=k+1}^{N} W(j)\,\Sigma_o(j|j) + [\Sigma(k+1|k) - \Sigma(k+1|k+1)]\,\mathbf{K}_o(k+1) \\
&\qquad\quad + \sum_{j=k+1}^{N-1} [\Sigma_o(j+1|j) - \Sigma_o(j+1|j+1)]\,\mathbf{K}_o(j+1) \Big\}
\end{aligned} \tag{4.11}$$


In Eq. (4.11), $\mathbf{K}_o(j)$ is a matrix which has the dimension of the
enlarged state. Denoting the random parameters by the vector

$$\theta(k) = [a(k),\ b(k)]', \tag{4.12}$$

the matrix $\mathbf{K}_o(j)$ can be partitioned as

$$\mathbf{K}_o(j) = \begin{bmatrix} K_o^{xx}(j) & K_o^{x\theta}(j) \\ K_o^{\theta x}(j) & K_o^{\theta\theta}(j) \end{bmatrix} \tag{4.13}$$

It turns out [3] that

$$K_o^{xx}(j) = K_o(j) \tag{4.14}$$

where $K_o(j)$ is the solution of the Riccati difference equation (4.6).
The matrices $K_o^{x\theta}(j)$ and $K_o^{\theta\theta}(j)$ can be obtained
from recursions [3] once the sequence $K_o^{xx}(j)$ is known. The vector
$p_o(k+1)$ is zero in our example because we deal with a regulator, not a
tracking problem. Also, $\Sigma(k+1|k)$, the one-step predicted covariance
of the enlarged state at stage $k$, is given by (4.9) with $j = k$, since

$$\hat x(k+1|k) = \bar a\,x(k) + \bar b\,u(k) \tag{4.15}$$


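In Eq. (4.11), the matrix $W(j)$ has the following structure (see [3],
Eq. (3.17)):

$$W(j) = \begin{bmatrix} Q & V_1 & V_2 \\ V_1 & 0 & 0 \\ V_2 & 0 & 0 \end{bmatrix} \tag{4.16}$$

The exact definition of $V_1$, $V_2$ is unimportant in this example, because

$$\operatorname{tr}[W(j)\,\Sigma_o(j|j)] = \operatorname{tr}\left\{ \begin{bmatrix} Q & V_1 & V_2 \\ V_1 & 0 & 0 \\ V_2 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & \Sigma_{aa} & \Sigma_{ab} \\ 0 & \Sigma_{ab} & \Sigma_{bb} \end{bmatrix} \right\} = 0 \tag{4.17}$$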


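On the other hand,

$$\begin{aligned}
\operatorname{tr}\{\mathbf{K}_o(j+1)\,[\Sigma_o(j+1|j) - \Sigma_o(j+1|j+1)]\}
&= \operatorname{tr}\left\{ \mathbf{K}_o(j+1) \left( \begin{bmatrix} \Sigma^o_{xx}(j+1|j) & 0 & 0 \\ 0 & \Sigma_{aa} & \Sigma_{ab} \\ 0 & \Sigma_{ab} & \Sigma_{bb} \end{bmatrix} - \begin{bmatrix} 0 & 0 & 0 \\ 0 & \Sigma_{aa} & \Sigma_{ab} \\ 0 & \Sigma_{ab} & \Sigma_{bb} \end{bmatrix} \right) \right\} \\
&= \Sigma^o_{xx}(j+1|j)\,K_o^{xx}(j+1) = \Sigma^o_{xx}(j+1|j)\,K_o(j+1)
\end{aligned} \tag{4.18}$$

From Eqs. (4.11), (4.14), (4.15), (4.17), (4.18), and the remarks just
made, it follows that the dual cost is given as follows in our problem:

$$\begin{aligned}
J_d[u(k)] = {}& \tfrac{1}{2} R\,u^2(k) + \tfrac{1}{2} [\bar a\,x(k) + \bar b\,u(k)]^2\,K_o(k+1) \\
&+ \tfrac{1}{2} K_o(k+1)\,\Sigma_{xx}(k+1|k) + \tfrac{1}{2} \sum_{j=k+1}^{N-1} K_o(j+1)\,\Sigma^o_{xx}(j+1|j)
\end{aligned} \tag{4.19}$$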

4.2  Infinite-Horizon  Case 

It will now be shown that, under the same assumptions which guarantee
the finiteness of the certainty-equivalent cost over an infinite horizon,
the dual cost too remains bounded. Therefore, there is a qualitative
difference between the dual adaptive control and the optimal control in the
infinite-horizon case: the former does not obey the uncertainty threshold
principle, which governs the latter.

Controllability of the deterministic dynamical system (4.4) is
equivalent to the property that $\bar b \neq 0$. Under the assumptions (3.6)
($\bar b \neq 0$, $Q > 0$, $R > 0$), it is well known [8] that the solution
$K_o(j)$ of the Riccati recursion (4.6) reaches a finite positive limit
$K_o$ as $N \to \infty$. Hence, if $u_o(j)$, $x_o(j)$ are respectively the
successive controls and states of the certainty-equivalent strategy,

$$\lim_{N\to\infty} \sum_{j=k+1}^{N-1} [Q\,x_o^2(j) + R\,u_o^2(j)] = \tfrac{1}{2} K_o\,x_o^2(k+1) < +\infty \tag{4.20}$$


According to Eq. (4.19), we must prove that
$\sum_{j=k+1}^{N-1} K_o(j+1)\,\Sigma^o_{xx}(j+1|j)$ remains bounded as
$N \to \infty$. From Eqs. (4.5) and (4.10),

$$\sum_{j=k+1}^{N-1} K_o(j+1)\,\Sigma^o_{xx}(j+1|j) = \sum_{j=k+1}^{N-1} K_o(j+1)\,[\Sigma_{aa} - 2\,\Sigma_{ab}\,G_o(j) + \Sigma_{bb}\,G_o^2(j)]\,x_o^2(j) \tag{4.21}$$

In fact, both $K_o(j+1)$ and $G_o(j)$ depend on $N$; let us emphasize that
dependence by the notation $K_o(j+1;N)$, $G_o(j;N)$. Clearly,

$$K_o(j+1;N) \le K_o(j;N) \tag{4.22}$$

since the left-hand side defines the minimal cost on a shorter horizon.
From Eq. (4.5),

$$\frac{\partial G_o(j)}{\partial K_o(j+1)} = \frac{R\,\bar a \bar b}{[R + K_o(j+1)\,\bar b^2]^2}$$

Accordingly

$$\frac{\partial G_o^2(j)}{\partial K_o(j+1)} = 2\,G_o(j)\,\frac{\partial G_o(j)}{\partial K_o(j+1)} = \frac{2R\,(\bar a \bar b)^2\,K_o(j+1)}{[R + K_o(j+1)\,\bar b^2]^3} \ge 0$$

whence it follows that, also,

$$G_o^2(j+1;N) \le G_o^2(j;N)$$

or

$$|G_o(j+1;N)| \le |G_o(j;N)| \tag{4.23}$$

From (4.22) and (4.23),

$$K_o(j+1;N) \le K_o$$

and

$$|G_o(j+1;N)| \le |G_o|$$

where $G_o$ is obtained from $K_o$ by the same function which yields
$G_o(j)$ from $K_o(j+1)$ (Eq. (4.5)). On the other hand, $\Sigma$, as a
covariance matrix, is symmetric and positive semi-definite. Therefore,

$$[\Sigma_{aa} - 2\,\Sigma_{ab}\,G_o(j) + \Sigma_{bb}\,G_o^2(j)] \le \sigma\,[G_o^2(j) + 1] \tag{4.24}$$

where $\sigma$ is the largest eigenvalue of $\Sigma$. As a result,

$$\sum_{j=k+1}^{N-1} K_o(j+1)\,\Sigma^o_{xx}(j+1|j) \le \sigma\,K_o\,[G_o^2 + 1] \sum_{j=k+1}^{N-1} x_o^2(j) \tag{4.25}$$

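However,

$$Q \lim_{N\to\infty} \sum_{j=k+1}^{N-1} x_o^2(j) \le \lim_{N\to\infty} \sum_{j=k+1}^{N-1} [Q\,x_o^2(j) + R\,u_o^2(j)] = \tfrac{1}{2} K_o\,x_o^2(k+1) < +\infty$$

Since $Q > 0$, it follows that

$$\lim_{N\to\infty} \sum_{j=k+1}^{N-1} x_o^2(j) < +\infty$$

and therefore, the left-hand side of (4.25) remains finite as $N \to \infty$.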

A  consequence  of  this  observation  is  that  there  will  be  an  important 
discrepancy  between  the  trajectory  resulting  from  the  application  of 
the  dual  control,  and  the  optimal  trajectory,  for  the  range  of  covariances 
which  do  not  obey  the  inequality  (3.5)  of  the  uncertainty  threshold 
principle.  This  qualitative  difference  is  confirmed  by  a  quantitative 
comparison  in  the  next  section. 
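The boundedness argument can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the proof: the function names, the choice $x_o(k+1) = 1$, and the parameter values are introduced here. It evaluates the sum in (4.21) along the certainty-equivalent nominal trajectory for two horizon lengths, using covariances that violate the threshold inequality (3.5).

```python
def ce_riccati(a_bar, b_bar, Q, R, N):
    """Riccati recursion (4.6) with K_o(N) = 0 and CE gains from (4.5)."""
    K = [0.0] * (N + 1)
    G = [0.0] * N
    for j in range(N - 1, -1, -1):
        denom = R + K[j + 1] * b_bar ** 2
        G[j] = K[j + 1] * a_bar * b_bar / denom
        K[j] = (Q + K[j + 1] * a_bar ** 2
                - (K[j + 1] * a_bar * b_bar) ** 2 / denom)
    return K, G


def dual_tail_sum(a_bar, b_bar, s_aa, s_ab, s_bb, Q, R, N,
                  x_start=1.0, k=0):
    """Sum over j = k+1..N-1 of K_o(j+1) * Sigma_xx^o(j+1|j), as in (4.21).

    x_start plays the role of the nominal state x_o(k+1).
    """
    K, G = ce_riccati(a_bar, b_bar, Q, R, N)
    x = x_start
    total = 0.0
    for j in range(k + 1, N):
        sigma_xx = (s_aa - 2.0 * s_ab * G[j] + s_bb * G[j] ** 2) * x ** 2
        total += K[j + 1] * sigma_xx
        x = (a_bar - b_bar * G[j]) * x   # nominal closed-loop step
    return total


# Covariances chosen so that the left-hand side of (3.5) exceeds 1.
short = dual_tail_sum(1.2, 1.0, 0.5, 0.0, 1.0, Q=1.0, R=1.0, N=200)
long_ = dual_tail_sum(1.2, 1.0, 0.5, 0.0, 1.0, Q=1.0, R=1.0, N=400)
```

Under these assumed values the two sums agree to many digits, which is what the bound (4.25) predicts: the nominal state decays geometrically, so the tail of the series contributes almost nothing.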

5. Comparison between Optimal and Dual Control in the Infinite-Horizon Case

From the expression (4.19) for the dual cost and the knowledge that
it remains bounded on an infinite horizon (Section 4.2), it is possible
to obtain a closed-form expression for the limit of the dual cost when
$N$ goes to infinity, in terms of the various problem data and the limit
$K_o$ of the solution to the Riccati recursion (4.6). This in turn provides
a closed-form expression for the dual adaptive control, which can therefore
be compared with the optimal control as given by Eqs. (3.1), (3.2),
(3.3). Let

$$\alpha(j;N) = K_o(j+1;N)\,\Sigma^o_{xx}(j+1|j) \tag{5.1}$$

We are interested in evaluating

$$L = \lim_{N\to\infty} \sum_{j=k+1}^{N} \alpha(j,N) \tag{5.2}$$

To that end, we use Eq. (4.22), and the stability of the closed-loop
dynamical system of the certainty-equivalent strategy. Namely,

$$x_o(j+1) = A_o(j)\,x_o(j) \tag{5.3}$$

where

$$A_o(j) = \bar a - \bar b\,G_o(j) = \frac{\bar a R}{R + K_o(j+1)\,\bar b^2} \tag{5.4}$$

It is known [8] that, under the assumptions (3.6), the asymptotic
closed-loop system is strictly stable:

$$|A| = \lim_{N\to\infty} |A_o(j)| = \frac{|\bar a|\,R}{R + K_o\,\bar b^2} < 1 \tag{5.5}$$

From Eqs. (4.21) and (5.3),

$$\alpha(j,N) = K_o(j+1)\,[\Sigma_{aa} - 2\,\Sigma_{ab}\,G_o(j) + \Sigma_{bb}\,G_o^2(j)] \prod_{i=k+1}^{j-1} A_o^2(i)\;x_o^2(k+1) \tag{5.6}$$

where, in fact, $K_o(j+1) = K_o(j+1;N)$, $G_o(j) = G_o(j;N)$,
$A_o(i) = A_o(i;N)$.

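However,

$$L = \lim_{N\to\infty} \sum_{j=k+1}^{N} \alpha(j,N) = \lim_{N\to\infty} \left[ \sum_{j=k+1}^{m} \alpha(j,N) + \sum_{j=m+1}^{N} \alpha(j,N) \right]$$

for any $m \in \{k+1, \ldots, N-1\}$. Therefore, also,

$$L = \lim_{m\to\infty} \lim_{N\to\infty} \left[ \sum_{j=k+1}^{m} \alpha(j,N) + \sum_{j=m+1}^{N} \alpha(j,N) \right]$$

But it has been shown in Section 4.2 that

$$\lim_{N\to\infty} \sum_{j=k}^{N} \alpha(j,N) < +\infty \quad \text{for all } k$$

Accordingly,

$$\lim_{m\to\infty} \left[ \lim_{N\to\infty} \sum_{j=m+1}^{N} \alpha(j,N) \right] = 0$$

and

$$L = \lim_{m\to\infty} \lim_{N\to\infty} \sum_{j=k+1}^{m} \alpha(j,N) = \lim_{m\to\infty} \sum_{j=k+1}^{m} \left[ \lim_{N\to\infty} \alpha(j,N) \right] \tag{5.7}$$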

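From (5.6), using the convergence of $K_o(j+1;N)$, $G_o(j;N)$ and
$A_o(j;N)$, it is concluded that $\alpha(j,N)$ goes to a limit as
$N \to \infty$, and

$$\lim_{N\to\infty} \alpha(j,N) = K_o\,[\Sigma_{aa} - 2\,\Sigma_{ab}\,G_o + \Sigma_{bb}\,G_o^2]\,A^{2(j-1-k)}\,x_o^2(k+1) \tag{5.8}$$

where

$$G_o = \frac{K_o\,\bar a \bar b}{R + K_o\,\bar b^2}$$

Accordingly

$$\begin{aligned}
L &= K_o\,[\Sigma_{aa} - 2\,\Sigma_{ab}\,G_o + \Sigma_{bb}\,G_o^2]\,x_o^2(k+1) \lim_{m\to\infty} \sum_{j=k+1}^{m} A^{2(j-1-k)} \\
&= \frac{K_o}{1 - A^2}\,[\Sigma_{aa} - 2\,\Sigma_{ab}\,G_o + \Sigma_{bb}\,G_o^2]\,x_o^2(k+1)
\end{aligned} \tag{5.9}$$

where the second equality results from $|A| < 1$.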

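In summary, taking into account Eqs. (4.3), (4.19), and (5.9), the
asymptotic value of the dual cost is arrived at:

$$\begin{aligned}
\lim_{N\to\infty} J_d[u(k)] = {}& \tfrac{1}{2} R\,u^2(k) + \tfrac{1}{2} K_o\,[\bar a\,x(k) + \bar b\,u(k)]^2 \\
&+ \tfrac{1}{2} K_o\,[\Sigma_{aa}\,x^2(k) + 2\,\Sigma_{ab}\,x(k)\,u(k) + \Sigma_{bb}\,u^2(k)] \\
&+ \tfrac{1}{2}\,\frac{K_o}{1 - A^2}\,[\Sigma_{aa} - 2\,G_o\,\Sigma_{ab} + G_o^2\,\Sigma_{bb}]\,[\bar a\,x(k) + \bar b\,u(k)]^2
\end{aligned} \tag{5.10}$$

The minimization of the asymptotic dual cost (5.10) with respect to
$u(k)$ yields the stationary dual adaptive control law, $u_d[x(k)]$:

$$u_d[x(k)] = -\,\frac{K_o\,\Sigma_{ab} + \left[ K_o + \dfrac{K_o}{1 - A^2}\,(\Sigma_{aa} - 2\,G_o\,\Sigma_{ab} + G_o^2\,\Sigma_{bb}) \right] \bar a \bar b}{R + K_o\,\Sigma_{bb} + \left[ K_o + \dfrac{K_o}{1 - A^2}\,(\Sigma_{aa} - 2\,G_o\,\Sigma_{ab} + G_o^2\,\Sigma_{bb}) \right] \bar b^2}\;x(k) \tag{5.11}$$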

Comparison of Eq. (5.11) with the asymptotic version of the optimal
control law (3.2) evidences a similar structure. However, the limit of
$K(j+1)$ as $N \to \infty$ which occurs in (3.2), if it exists, is the
positive solution of Eq. (3.7), that is, $K(\Sigma)$. Recall that that
limit exists if and only if $\Sigma$ lies within the region defined by
Eq. (3.5).

In contrast, the parameter $K_o$ which occurs in (5.11) is always defined,
finite and positive, and

$$K_o = K(\Sigma)\big|_{\Sigma = 0} \tag{5.12}$$

From Eq. (3.7), the gradient of $K(\Sigma)$ with respect to $\Sigma$,
evaluated at $\Sigma = 0$, can be found (see appendix), and the resulting
first-order expression of $K(\Sigma)$ about $\Sigma = 0$ is accordingly
found:

$$K(\Sigma) = K_o + \left. \frac{\partial K}{\partial \Sigma}^{T}(\Sigma) \right|_{\Sigma = 0} \Sigma + o(\Sigma) \tag{5.13}$$

$$\left. \frac{\partial K}{\partial \Sigma}^{T}(\Sigma) \right|_{\Sigma = 0} \Sigma = \frac{K_o}{1 - A^2}\,(\Sigma_{aa} - 2\,G_o\,\Sigma_{ab} + G_o^2\,\Sigma_{bb}) \tag{5.14}$$
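The gradient in (5.14) can be cross-checked numerically by finite differences on the fixed point of (3.7). The sketch below is illustrative: the function names, the fixed iteration count, and the parameter values are assumptions made here. Perturbing $\Sigma_{ab}$ alone does not give a valid covariance matrix, but (3.7) remains well defined there, which is enough for checking a partial derivative.

```python
def solve_K(a, b, s_aa, s_ab, s_bb, Q, R, iters=2000):
    """Iterate the recursion (3.3) from K = 0 to approximate the solution
    of (3.7). Assumes the threshold inequality (3.5) holds, so that the
    iteration converges."""
    K = 0.0
    for _ in range(iters):
        K = (Q + K * (s_aa + a * a)
             - (K * (s_ab + a * b)) ** 2 / (R + K * (s_bb + b * b)))
    return K


def gradient_at_zero(a, b, Q, R):
    """Closed-form partial derivatives of K at Sigma = 0, read off (5.14).

    Returns K_o and the tuple (dK/dS_aa, dK/dS_ab, dK/dS_bb).
    """
    K0 = solve_K(a, b, 0.0, 0.0, 0.0, Q, R)
    G0 = K0 * a * b / (R + K0 * b * b)   # stationary CE gain
    A = a - b * G0                       # stationary CE closed-loop pole
    c = K0 / (1.0 - A * A)
    return K0, (c, -2.0 * G0 * c, G0 * G0 * c)


K0, grad = gradient_at_zero(0.9, 1.0, Q=1.0, R=1.0)
eps = 1e-6
fd_aa = (solve_K(0.9, 1.0, eps, 0.0, 0.0, 1.0, 1.0) - K0) / eps
```

With these assumed values, `fd_aa` should agree with `grad[0]` up to the finite-difference error, and the same comparison can be repeated for the other two components.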


Hence, the expression between brackets in (5.11) is recognized as the
first-order expansion of $K(\Sigma)$ in $\Sigma$ about $\Sigma = 0$. Note
that $\left[ K_o + \left. \frac{\partial K}{\partial \Sigma}^{T}(\Sigma) \right|_{\Sigma=0} \Sigma \right]$
is positive, regardless of the value of the covariance matrix $\Sigma$.
This follows from (5.14), the fact that $K_o > 0$, $1 - A^2 > 0$ and
the positive semidefiniteness of $\Sigma$. The stationary optimal control
law (from Eqs. (3.1), (3.2)) exists in the neighborhood of $\Sigma = 0$
(because $\Sigma = 0$ satisfies Eq. (3.5), and by continuity) and can also
be expanded to first order in $\Sigma$:

$$u_{OPT}(\Sigma) = u_{OPT}(0) + \left[ \frac{\partial u_{OPT}}{\partial \Sigma} + \frac{\partial u_{OPT}}{\partial K}\,\frac{\partial K}{\partial \Sigma} \right]^{T}_{\Sigma = 0} \Sigma + o(\Sigma) \tag{5.15}$$

where both the direct dependence of $u_{OPT}$ on $\Sigma$ and the indirect
dependence through $K(\Sigma)$ have been taken into account. It follows
from (5.15) and (5.13), (5.14) that

$$u_d(\Sigma) - u_{OPT}(\Sigma) = o(\Sigma)$$

or

$$\lim_{\|\Sigma\| \to 0} \frac{|u_d(\Sigma) - u_{OPT}(\Sigma)|}{\|\Sigma\|} = 0$$

where $\|\Sigma\|$ is the Euclidean norm, for instance (see the appendix for
a proof).

On the other hand, the approximation is no better than first
order. Indeed (see appendix), the second derivatives of $u_{OPT}$ with respect
to $\Sigma$ involve the second derivatives of $K(\Sigma)$, evaluated at $\Sigma = 0$, which are
not present in the dual $u_d$. Therefore, the stationary dual control (on
an infinite time-horizon) is the first-order approximation of the optimal
control, as a function of the covariance matrix $\Sigma$, about the numerical
value $\Sigma = 0$ which corresponds to a deterministic problem. It has already
been pointed out (section 3.1) that the certainty-equivalent control is
a zeroth-order approximation, in the sense that

$$u_{CE}(\Sigma) = u_{OPT}(\Sigma)\big|_{\Sigma = 0}$$


This is apparent from Eq. (4.5). Thus, the result of this section shows
that, in our particular problem and for an infinite horizon, the dual
control performs better than the CE control, but less well than the
optimal one. The accuracy of the dual control can be quite high for small
covariances, which is somewhat surprising in view of the fact that the
parameter cannot be learned, due to the white-noise property.
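The ordering "CE is zeroth order, dual is first order" can be illustrated numerically. The sketch below is hedged: the stationary optimal gain and the stationary dual gain are written in forms inferred from the structure of (A.13), not copied from (3.2) or (5.11), and all numerical values, the covariance direction, and the closed forms used for G_0 and $\Lambda$ are assumptions.

```python
# Illustrative values for the means a_bar, b_bar and the weights Q, R (assumed).
A_BAR, B_BAR, Q, R = 0.8, 1.0, 1.0, 1.0


def solve_K(s_aa, s_ab, s_bb, iters=5000):
    """Positive solution of the stationary equation, by fixed-point iteration."""
    K = 0.0
    for _ in range(iters):
        K = (Q + K * (s_aa + A_BAR**2)
             - (K * (s_ab + A_BAR * B_BAR))**2 / (R + K * (s_bb + B_BAR**2)))
    return K


K0 = solve_K(0.0, 0.0, 0.0)
G0 = K0 * A_BAR * B_BAR / (R + K0 * B_BAR**2)    # CE gain (assumed form)
LAM = (A_BAR * R / (R + K0 * B_BAR**2))**2       # assumed closed form of Lambda


def optimal_gain(s_aa, s_ab, s_bb):
    """Assumed stationary optimal gain g, with u_OPT = -g * x."""
    K = solve_K(s_aa, s_ab, s_bb)
    return K * (s_ab + A_BAR * B_BAR) / (R + K * (s_bb + B_BAR**2))


def dual_gain(s_aa, s_ab, s_bb):
    """Assumed stationary dual gain: K is replaced by its first-order expansion
    where it multiplies the means, and by K0 where it multiplies covariances."""
    K1 = K0 + K0 / (1.0 - LAM) * (s_aa - 2.0 * G0 * s_ab + G0**2 * s_bb)
    return (K1 * A_BAR * B_BAR + K0 * s_ab) / (R + K1 * B_BAR**2 + K0 * s_bb)


def errors(eps, direction=(0.3, 0.0, 0.1)):
    """Return (|CE - optimal|, |dual - optimal|) for Sigma = eps * direction."""
    sigma = [eps * d for d in direction]
    g_opt = optimal_gain(*sigma)
    return abs(G0 - g_opt), abs(dual_gain(*sigma) - g_opt)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, errors(eps))
```

Under these assumptions the CE error should scale like `eps` and the dual error like `eps` squared, which is the numerical counterpart of the statement that the dual law is a first-order approximation and no better.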

When the parameter covariances grow large, however, the discrepancy
between the dual and the optimal control can become substantial. This is
confirmed, for instance, by the consideration of limiting cases. Assume
that a and b are uncorrelated, with the variance of a
being fixed. For $\Sigma_{bb} \to \infty$, $K(\Sigma)$ goes to:

$$K(\Sigma_{aa}) = \frac{Q}{1 - \bar{a}^2 - \Sigma_{aa}}$$

as is apparent from Eq. (3.7). The inequality (3.5) to be satisfied by
the covariances is

$$\Sigma_{aa} + \bar{a}^2 < 1$$

The optimal control law (3.2) goes to zero when $\Sigma_{bb}$ goes to infinity.
This is an example of the caution effect: the control is inhibited by
uncertainties that it cannot affect. In contrast, the dual control law
$u_d(k)$ goes to a finite limit:

$$\lim_{\Sigma_{bb} \to \infty} u_d(k) = -\frac{\bar{a}\,\bar{b}}{\bar{b}^2 + \dfrac{1 - \Lambda}{G_0^2}}\; x(k)$$

Hence, one can say in that case that the dual law is not cautious enough.
The fit can, however, sometimes be better, even at large values of the
covariances. For instance, in another limiting case where a and b are
still uncorrelated, but $\Sigma_{bb}$ remains fixed and $\Sigma_{aa} \to \infty$, both laws have the
same limit:

$$\lim_{\Sigma_{aa} \to \infty} u_{OPT}(k) = \lim_{\Sigma_{aa} \to \infty} u_d(k) = -\frac{\bar{a}}{\bar{b}}\; x(k).$$
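The limiting cases above can be probed numerically. This Python sketch uses a very large but finite covariance in place of the limit. The gain expressions are the same inferred forms as in the rest of this illustration (assumptions, not quotations of (3.2) or (5.11)), and the numerical values are arbitrary. In the second case only the dual gain is evaluated, because the simple fixed-point iteration used here does not converge for large $\Sigma_{aa}$.

```python
# Illustrative values for the means a_bar, b_bar and the weights Q, R (assumed).
A_BAR, B_BAR, Q, R = 0.8, 1.0, 1.0, 1.0


def solve_K(s_aa, s_ab, s_bb, iters=5000):
    """Positive solution of the stationary equation, by fixed-point iteration."""
    K = 0.0
    for _ in range(iters):
        K = (Q + K * (s_aa + A_BAR**2)
             - (K * (s_ab + A_BAR * B_BAR))**2 / (R + K * (s_bb + B_BAR**2)))
    return K


K0 = solve_K(0.0, 0.0, 0.0)
G0 = K0 * A_BAR * B_BAR / (R + K0 * B_BAR**2)    # assumed deterministic gain
LAM = (A_BAR * R / (R + K0 * B_BAR**2))**2       # assumed closed form of Lambda


def optimal_gain(s_aa, s_ab, s_bb):
    """Assumed stationary optimal gain g, with u_OPT = -g * x."""
    K = solve_K(s_aa, s_ab, s_bb)
    return K * (s_ab + A_BAR * B_BAR) / (R + K * (s_bb + B_BAR**2))


def dual_gain(s_aa, s_ab, s_bb):
    """Assumed stationary dual gain g, with u_d = -g * x."""
    K1 = K0 + K0 / (1.0 - LAM) * (s_aa - 2.0 * G0 * s_ab + G0**2 * s_bb)
    return (K1 * A_BAR * B_BAR + K0 * s_ab) / (R + K1 * B_BAR**2 + K0 * s_bb)


S_AA = 0.1      # fixed variance of a, chosen so that S_AA + A_BAR**2 < 1
BIG = 1.0e6     # stands in for "Sigma_bb -> infinity"

K_big = solve_K(S_AA, 0.0, BIG)
K_limit = Q / (1.0 - A_BAR**2 - S_AA)
dual_limit = A_BAR * B_BAR / (B_BAR**2 + (1.0 - LAM) / G0**2)

if __name__ == "__main__":
    print("K:", K_big, "limit:", K_limit)
    print("optimal gain:", optimal_gain(S_AA, 0.0, BIG))
    print("dual gain:", dual_gain(S_AA, 0.0, BIG), "limit:", dual_limit)
    print("dual gain, large Sigma_aa:", dual_gain(1.0e9, 0.0, 0.1))
```

With these values the optimal gain is driven toward zero while the dual gain settles at a strictly positive value, which is the "not cautious enough" behaviour described in the text.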


6.  Decomposition  of  the  Dual  Cost 

A  decomposition  of  the  dual  cost  for  the  general  discrete  stochastic 
control  problem  with  quadratic  cost,  linear  dynamical  equations  and 
linear  evolution  equation  for  the  random  parameters  has  been  proposed 
in the literature [5]. This decomposition splits the dual cost into a
deterministic term, a "caution" term and a "probing" term. The
deterministic term $J_D(k)$ represents the value of the cost-to-go corresponding
to the certainty-equivalent strategy; that is, it depends on the unknown
coefficients only through their current estimated expectations.

The caution term $J_c(k)$ is supposed to reflect those uncertainties
that the control at stage k cannot affect directly, although it can
affect their weightings. Those include the one-step predicted covariance
of the enlarged state at stage k, and the covariance of the noise of
the enlarged state.

The probing term $J_p(k)$ contains those uncertainties which the control
at stage k can influence; those include the future updated covariances.
In our problem, however, the updated covariance matrices of future states
are all equal to the a priori covariance matrix of the parameters a, b,
because of the white-noise property, so that they cannot be influenced
by the control. The various components of the dual cost are as follows
[5], [6]:


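$$J_D(k) = \tfrac{1}{2} R\,u^2(k) + \tfrac{1}{2}\left[\bar{a}\,x(k) + \bar{b}\,u(k)\right]^2 K_0(k+1) \tag{6.1}$$

$$J_c(k) = \tfrac{1}{2} K_0(k+1)\,\Sigma_{xx}(k+1|k) + \tfrac{1}{2} \sum_{j=k+1}^{N-1} \operatorname{tr}\left(K^{\theta\theta}_{j+1}\,\Sigma^{\theta\theta}\right) \tag{6.2}$$

and $J_p(k)$ is given in [5] as a function of the matrices $K_{j+1}$, for $j = k+1, \ldots, N-1$.

The dual cost is the sum of the three terms:

$$J_d(k) = J_D(k) + J_c(k) + J_p(k). \tag{6.3}$$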

Using the recursions [3] satisfied by $K^{\theta\theta}_{j+1}$, and Eq. (4.10) for $\Sigma_{xx}(j+1|j)$,
it is possible to verify that (6.3) is consistent with (4.19). However,
the caution and probing terms combine, in our example, to yield:

$$J_c(k) + J_p(k) = \tfrac{1}{2} K_0(k+1)\,\Sigma_{xx}(k+1|k) + \tfrac{1}{2} \sum_{j=k+1}^{N-1} K_0(j+1)\,\Sigma_{xx}(j+1|j) \tag{6.4}$$


In view of Eq. (4.10), it is clear that the control $u(k)$ can affect both
$\Sigma_{xx}(k+1|k)$ and $\Sigma_{xx}(j+1|j)$, for $j \ge k+1$, but it cannot affect the
coefficients $K_0(i)$ ($i = k+1, \ldots, N$). Hence, the decomposition into (6.1)
and (6.2) does not seem to have any intuitive appeal in the present
situation.

Perhaps another splitting of the cost would be more appropriate,
where the nondeterministic part of the cost, $J_d - J_D$, would be expressed
as the sum of one term which corresponds to the open-loop feedback
strategy [7], and the difference.

In conclusion, it seems that, even though the dual algorithm is
very near optimality for small covariances (Section 5), its action cannot
be explained by the decomposition between probing and caution in the
present scalar example.


7.  Conclusion  and  Suggestions  for  Future  Work 

The motivation for this analysis has been the desire to gain more
insight into the behavior of the wide-sense dual control algorithm [1],
[2], whose available results so far arise from simulations. Those
results are, of necessity, qualitative rather than quantitative because
a comparison of the adaptive control with the optimal control is usually
impossible since the latter is unknown. An attempt towards the
quantification of some desirable adaptive features possessed by the dual control,
namely probing and caution, was made recently [5], by splitting the dual cost
into component terms, each of which is claimed to account for a particular
effect.
The approach that we have taken here has been to concentrate on a
special discrete stochastic control problem (quadratic cost, linear
dynamics, multiplicative Gaussian white noise with perfectly observed
state) where the optimal control is known. The special nature of the
problem makes it possible to evaluate the dual control, too, in closed
analytical form, at least for the infinite-horizon case. This permits
a thorough comparison with the optimal control, which reveals (1) that
the dual control does not share a fundamental property of the optimal
control, the uncertainty threshold principle; (2) that the dual control
approximates the optimal control linearly in the covariances of the
random parameters, for small values of the parameter covariances.

Since no learning can occur (because the parameters are white noise),
one would expect the probing term in the dual cost to vanish. This is,
however, not the case. Instead, the probing term and the caution term combine
to yield a positively weighted sum of the one-step predicted covariances
of the future states. This observation makes one doubt the usefulness
of the splitting between caution and probing terms in general, as well
as their intuitive meaning.

Also, alternative decompositions of the dual cost should be
investigated. Ideally, one term should correspond to the
certainty-equivalent control law (this is accomplished by the deterministic term);
another term, to the open-loop feedback law; and the remaining term would
account for the learning characteristics of the algorithm.
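9. Appendix

We shall establish equations (5.14) and (5.16).

1. Proof of Equation (5.14)

Equation (3.7) can be described abstractly as

$$F(K, \Sigma) = 0 \tag{A.1}$$

Hence, if $K(\Sigma)$ denotes the positive solution, the following is an identity
in $\Sigma$:

$$F[K(\Sigma), \Sigma] = 0 \tag{A.2}$$

Upon differentiating (A.2) with respect to $\Sigma$ about $\Sigma = 0$, one obtains:

$$\left(\frac{\partial K}{\partial \Sigma}\right)_0^{T} \left(\frac{\partial F}{\partial K}\right)_0 + \left(\frac{\partial F}{\partial \Sigma}\right)_0^{T} = 0 \tag{A.3}$$

whence

$$\left(\frac{\partial K}{\partial \Sigma}\right)_0^{T} \Sigma = -\,\frac{\left(\frac{\partial F}{\partial \Sigma}\right)_0^{T} \Sigma}{\left(\frac{\partial F}{\partial K}\right)_0} \tag{A.4}$$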

The numerator and the denominator in (A.4) are now calculated. From
(3.7),

$$F(K, \Sigma) = Q + K(\Sigma_{aa} + \bar{a}^2) - \frac{K^2 (\Sigma_{ab} + \bar{a}\bar{b})^2}{R + K(\Sigma_{bb} + \bar{b}^2)} - K$$


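Denominator of (A.4)

$$\frac{\partial F}{\partial K}(K, \Sigma) = \bar{a}^2 + \Sigma_{aa} - 1 - \frac{K (\Sigma_{ab} + \bar{a}\bar{b})^2 \left[2R + 2K(\Sigma_{bb} + \bar{b}^2) - K(\Sigma_{bb} + \bar{b}^2)\right]}{\left[R + K(\Sigma_{bb} + \bar{b}^2)\right]^2}$$

Therefore,

$$\left(\frac{\partial F}{\partial K}\right)_0 = \bar{a}^2 - 1 - \frac{K_0\,\bar{a}^2 \bar{b}^2 (K_0 \bar{b}^2 + 2R)}{(R + K_0 \bar{b}^2)^2} = \bar{a}^2 - 1 - \frac{\bar{a}^2 \left[(R + K_0 \bar{b}^2)^2 - R^2\right]}{(R + K_0 \bar{b}^2)^2} = \frac{\bar{a}^2 R^2}{(R + K_0 \bar{b}^2)^2} - 1 = \Lambda - 1$$

where the last equality results from (5.4), (5.5).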
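The closed form obtained for the denominator can be checked against a finite-difference derivative of F. The Python sketch below is only an illustration: the numerical values of the means and weights are arbitrary assumptions, and F is taken in the form quoted in this appendix with all covariances set to zero.

```python
# Illustrative values for the means a_bar, b_bar and the weights Q, R (assumed).
A_BAR, B_BAR, Q, R = 0.8, 1.0, 1.0, 1.0


def F(K, s_aa=0.0, s_ab=0.0, s_bb=0.0):
    """F(K, Sigma) in the form quoted in the appendix."""
    return (Q + K * (s_aa + A_BAR**2) - K
            - (K * (s_ab + A_BAR * B_BAR))**2 / (R + K * (s_bb + B_BAR**2)))


def solve_K0(iters=5000):
    """Positive root of F(K, 0) = 0, via the fixed-point iteration K <- K + F(K, 0)."""
    K = 0.0
    for _ in range(iters):
        K = K + F(K)
    return K


K0 = solve_K0()

# Central finite difference of F with respect to K at (K0, Sigma = 0).
H = 1.0e-6
dF_dK_numeric = (F(K0 + H) - F(K0 - H)) / (2.0 * H)

# Closed form derived in the text: a^2 R^2 / (R + K0 b^2)^2 - 1.
dF_dK_closed = (A_BAR * R / (R + K0 * B_BAR**2))**2 - 1.0

if __name__ == "__main__":
    print("K0 =", K0)
    print("numeric:", dF_dK_numeric, "closed form:", dF_dK_closed)
```

The closed-form value is negative for these parameters, consistent with the positivity of the factor used in (5.14).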


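Numerator of (A.4)

$$\left(\frac{\partial F}{\partial \Sigma}\right)_0^{T} \Sigma = \left(\frac{\partial F}{\partial \Sigma_{aa}}\right)_0 \Sigma_{aa} + \left(\frac{\partial F}{\partial \Sigma_{ab}}\right)_0 \Sigma_{ab} + \left(\frac{\partial F}{\partial \Sigma_{bb}}\right)_0 \Sigma_{bb} \tag{A.7}$$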


On the other hand, the dual law is expressed by (5.10) as

$$u_d(\Sigma) = \frac{A + B\left[K(0) + K'(0)^{T} \Sigma\right] + c^{T} \Sigma\, K(0)}{A_1 + B_1\left[K(0) + K'(0)^{T} \Sigma\right] + c_1^{T} \Sigma\, K(0)}\; x \tag{A.13}$$

where

$$K'(0) \triangleq \left(\frac{\partial K}{\partial \Sigma}\right)_0$$

with the same coefficients $A$, $B$, $c$, $A_1$, $B_1$, $c_1$ as in (A.12). Comparison
of (A.12) and (A.13) shows that, in (A.13), both the numerator and
the denominator of (A.12) have been replaced by their first-order expansion
in $\Sigma$ about 0. It follows that (A.12) and (A.13) have the same
first-order expansion in $\Sigma$ about $\Sigma = 0$.
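In effect,

$$u_{OPT}(0) = u_d(0) = \frac{A + B\,K(0)}{A_1 + B_1 K(0)}\; x$$

and

$$\left(\frac{\partial u_{OPT}}{\partial \Sigma}\right)_0^{T} = \left(\frac{\partial u_d}{\partial \Sigma}\right)_0^{T} = \frac{\left[A_1 + B_1 K(0)\right]\left[B\,K'(0)^{T} + K(0)\,c^{T}\right]}{\left[A_1 + B_1 K(0)\right]^2}\; x - \frac{\left[A + B\,K(0)\right]\left[B_1 K'(0)^{T} + K(0)\,c_1^{T}\right]}{\left[A_1 + B_1 K(0)\right]^2}\; x$$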
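The reason the two first-order expansions coincide can be spelled out in one step. The shorthand $N$, $D$, $\hat{N}$, $\hat{D}$ and $\hat{K}$ is introduced here for illustration and is not the paper's notation. It is also assumed that (A.12) has the same form as (A.13) with the exact $K(\Sigma)$ in place of its approximations, and that $A_1 + B_1 K(0) \neq 0$.

```latex
% Exact numerator (as in (A.12), assumed form) and its dual counterpart (as in (A.13)):
N(\Sigma) = A + B\,K(\Sigma) + c^{T}\Sigma\,K(\Sigma), \qquad
\hat{N}(\Sigma) = A + B\,\hat{K}(\Sigma) + c^{T}\Sigma\,K(0), \qquad
\hat{K}(\Sigma) = K(0) + K'(0)^{T}\Sigma .

% Their difference is of higher than first order:
N(\Sigma) - \hat{N}(\Sigma)
  = B\,\bigl[K(\Sigma) - \hat{K}(\Sigma)\bigr] + c^{T}\Sigma\,\bigl[K(\Sigma) - K(0)\bigr]
  = o(\Sigma) + O(\|\Sigma\|^{2}) = o(\Sigma).

% The same argument with (A_1, B_1, c_1) gives D(\Sigma) - \hat{D}(\Sigma) = o(\Sigma), hence
\frac{N}{D} - \frac{\hat{N}}{\hat{D}}
  = \frac{(N - \hat{N})\,\hat{D} - \hat{N}\,(D - \hat{D})}{D\,\hat{D}}
  = o(\Sigma),
\qquad \text{since } D(0) = \hat{D}(0) = A_1 + B_1 K(0) \neq 0 .
```

Multiplying by the state then gives a difference between the optimal and dual controls that is $o(\Sigma)$, which is the content of (5.16).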


